Exact alignment recovery for correlated Erdős-Rényi graphs

Authors

  • Daniel Cullina
  • Negar Kiyavash
Abstract

We consider the problem of perfectly recovering the vertex correspondence between two correlated Erdős-Rényi (ER) graphs on the same vertex set. The correspondence between the vertices can be obscured by randomly permuting the vertex labels of one of the graphs. We determine the information-theoretic threshold for exact recovery, i.e. the conditions under which the entire vertex correspondence can be correctly recovered given unbounded computational resources.

Graph alignment is the problem of finding a matching between the vertices of two graphs that matches, or aligns, many edges of the first graph with edges of the second graph. Alignment is a generalization of graph isomorphism recovery to non-isomorphic graphs. Graph alignment can be applied in the deanonymization of social networks, the analysis of protein interaction networks, and computer vision. Narayanan and Shmatikov successfully deanonymized an anonymized social network dataset by aligning it with a publicly available network [1]. In order to make privacy guarantees in this setting, it is necessary to understand the conditions under which graph alignment recovery is possible.

We consider graph alignment for a randomized graph-pair model. This generation procedure creates a "planted" alignment: there is a ground-truth relationship between the vertices of the two graphs. Pedarsani and Grossglauser [2] were the first to approach the problem of finding information-theoretic conditions for alignment recovery. They established conditions under which exact recovery of the planted alignment is possible. The present authors improved on these conditions and also established conditions under which exact recovery is impossible [3]. In this paper, we close the gap between these results and establish the precise threshold for exact recovery in sparse graphs. As a special case, we recover a result of Wright [4] about the conditions under which an Erdős-Rényi graph has a trivial automorphism group.

I. MODEL
A. The alignment recovery problem

We consider the following problem. There are two correlated graphs Ga and Gb, both on the vertex set [n] = {0, 1, . . . , n−1}. By correlation we mean that for each vertex pair e, the presence or absence of e ∈ E(Ga), or equivalently the indicator variable Ga(e), provides some information about Gb(e). The true vertex labels of Ga are removed and replaced with meaningless labels. We model this by applying a uniformly random permutation Π to map the vertices of Ga to the vertices of its anonymized version. The anonymized graph is Gc, where Gc({Π(i), Π(j)}) = Ga({i, j}) for all i, j ∈ [n], i ≠ j. The original vertex labels of Gb are preserved, and Gc and Gb are revealed. We would like to know under what conditions it is possible to discover the true correspondence between the vertices of Ga and the vertices of Gb. In other words, when can the random permutation Π be exactly recovered with high probability? In this context, an achievability result demonstrates the existence of an algorithm or estimator that exactly recovers Π with high probability. A converse result is an upper bound on the probability of exact recovery that applies to any estimator.

B. Correlated Erdős-Rényi graphs

To fully specify this problem, we need to define a joint distribution over Ga and Gb. In this paper, we will focus on Erdős-Rényi (ER) graphs. We discuss some of the advantages and drawbacks of this model in Section II-F. We will generate correlated Erdős-Rényi graphs as follows. Let Ga and Gb be graphs on the vertex set [n]. We will think of (Ga, Gb) as a single function with codomain {0, 1}²: (Ga, Gb)(e) = (Ga(e), Gb(e)). The random variables (Ga, Gb)(e), for e ranging over the vertex pairs ([n] choose 2), are i.i.d. and

  (Ga, Gb)(e) =
      (1, 1)  w.p. p11
      (1, 0)  w.p. p10
      (0, 1)  w.p. p01
      (0, 0)  w.p. p00.

Call this distribution ER(n, p), where p = (p11, p10, p01, p00).
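As a concrete illustration, the generation and anonymization procedure above can be sketched in Python. This is a minimal sketch; the function names and edge representation are ours, not from the paper:

```python
import itertools
import random

def sample_correlated_er(n, p11, p10, p01):
    """Sample (Ga, Gb) ~ ER(n, p): for each vertex pair e, the indicator
    pair (Ga(e), Gb(e)) is drawn i.i.d. from (p11, p10, p01, p00)."""
    Ga, Gb = set(), set()
    for e in itertools.combinations(range(n), 2):
        u = random.random()
        if u < p11:                    # (1, 1): edge present in both graphs
            Ga.add(e); Gb.add(e)
        elif u < p11 + p10:            # (1, 0): edge in Ga only
            Ga.add(e)
        elif u < p11 + p10 + p01:      # (0, 1): edge in Gb only
            Gb.add(e)
        # else (0, 0): edge in neither, which happens w.p. p00
    return Ga, Gb

def anonymize(Ga, n):
    """Relabel Ga by a uniformly random permutation Pi, producing Gc
    with Gc({Pi(i), Pi(j)}) = Ga({i, j})."""
    pi = list(range(n))
    random.shuffle(pi)
    Gc = {tuple(sorted((pi[i], pi[j]))) for (i, j) in Ga}
    return Gc, pi
```

The alignment recovery problem is then: given Gc and Gb (but not pi), estimate pi exactly.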
Note that the marginal distributions of Ga and Gb are Erdős-Rényi, and so is the distribution of the intersection graph Ga ∧ Gb: Ga ∼ ER(n, p10 + p11), Gb ∼ ER(n, p01 + p11), and Ga ∧ Gb ∼ ER(n, p11). When p11 > (p10 + p11)(p01 + p11), we say that the graphs Ga and Gb have positive correlation. Observe that p11 − (p10 + p11)(p01 + p11) = p11 p00 − p01 p10, so p11 p00 > p10 p01 is an equivalent, more symmetric condition for positive correlation.

C. Results

All of the results concern the following setting. We have (Ga, Gb) ∼ ER(n, p), Π is a uniformly random permutation of [n] independent of (Ga, Gb), and Gc is the anonymization of Ga by Π as described in Section I-A. Our main result is the following.
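(The equivalence of the two positive-correlation conditions above can be spot-checked numerically; the probability values below are arbitrary examples chosen only to sum to 1:)

```python
# Check that p11 - (p10 + p11)(p01 + p11) = p11*p00 - p01*p10
# whenever the four probabilities sum to 1 (arbitrary example values).
p11, p10, p01 = 0.2, 0.1, 0.15
p00 = 1.0 - p11 - p10 - p01             # probability mass on (0, 0)

lhs = p11 - (p10 + p11) * (p01 + p11)   # 0.2 - 0.30 * 0.35 = 0.095
rhs = p11 * p00 - p01 * p10             # 0.11 - 0.015      = 0.095
assert abs(lhs - rhs) < 1e-12
assert p11 * p00 > p10 * p01            # this example is positively correlated
```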

Journal:
  • CoRR

Volume abs/1711.06783  Issue 

Pages  -

Publication date 2017